283,052 results for "Data collection"
Web scraping with Python : collecting more data from the modern web
If programming is magic, then web scraping is surely a form of wizardry. By writing a simple automated program, you can query web servers, request data, and parse it to extract the information you need. The expanded edition of this practical book not only introduces you to web scraping, but also serves as a comprehensive guide to scraping almost every type of data from the modern web. Part I focuses on web scraping mechanics: using Python to request information from a web server, performing basic handling of the server's response, and interacting with sites in an automated fashion. Part II explores a variety of more specific tools and applications to fit any web scraping scenario you're likely to encounter.
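The request-and-parse loop described above can be sketched in a few lines. This is a minimal illustration, not code from the book; the URL is a placeholder, and it assumes the third-party requests library plus Python's standard-library HTML parser:

    import requests
    from html.parser import HTMLParser

    class TitleParser(HTMLParser):
        """Collects the text of every <h2> element encountered."""
        def __init__(self):
            super().__init__()
            self.in_h2 = False
            self.titles = []

        def handle_starttag(self, tag, attrs):
            if tag == "h2":
                self.in_h2 = True

        def handle_endtag(self, tag):
            if tag == "h2":
                self.in_h2 = False

        def handle_data(self, data):
            if self.in_h2 and data.strip():
                self.titles.append(data.strip())

    response = requests.get("https://example.com/articles")  # placeholder URL
    response.raise_for_status()
    parser = TitleParser()
    parser.feed(response.text)
    print(parser.titles)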
Ensuring the quality and specificity of preregistrations
Researchers face many, often seemingly arbitrary, choices in formulating hypotheses, designing protocols, collecting data, analyzing data, and reporting results. Opportunistic use of "researcher degrees of freedom" aimed at obtaining statistical significance increases the likelihood of obtaining and publishing false-positive results and overestimated effect sizes. Preregistration is a mechanism for reducing such degrees of freedom by specifying designs and analysis plans before observing the research outcomes. The effectiveness of preregistration may depend, in part, on whether the process facilitates sufficiently specific articulation of such plans. In this preregistered study, we compared 2 formats of preregistration available on the OSF: Standard Pre-Data Collection Registration and Prereg Challenge Registration (now called "OSF Preregistration," http://osf.io/prereg/). The Prereg Challenge format was a "structured" workflow with detailed instructions and an independent review to confirm completeness; the "Standard" format was "unstructured" with minimal direct guidance to give researchers flexibility for what to prespecify. Results of comparing random samples of 53 preregistrations from each format indicate that the "structured" format restricted the opportunistic use of researcher degrees of freedom better (Cliff's Delta = 0.49) than the "unstructured" format, but neither eliminated all researcher degrees of freedom. We also observed very low concordance among coders about the number of hypotheses (14%), indicating that they are often not clearly stated. We conclude that effective preregistration is challenging, and registration formats that provide effective guidance may improve the quality of research.
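For readers unfamiliar with the effect size reported above: Cliff's Delta is the probability that a value drawn from one sample exceeds a value drawn from the other, minus the reverse probability. A minimal sketch of the standard definition (not the authors' analysis code):

    def cliffs_delta(xs, ys):
        """Cliff's delta: P(x > y) - P(x < y) over all cross-sample pairs."""
        greater = sum(1 for x in xs for y in ys if x > y)
        less = sum(1 for x in xs for y in ys if x < y)
        return (greater - less) / (len(xs) * len(ys))

    # Ranges from -1 to 1; 0 means complete overlap between the samples.
    print(cliffs_delta([4, 5, 6], [1, 2, 6]))  # -> 0.444...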
Practical web scraping for data science : best practices and examples with Python
This book provides a complete and modern guide to web scraping, using Python as the programming language, without glossing over important details or best practices. Written with a data science audience in mind, the book explores both scraping and the larger context of web technologies in which it operates, to ensure full understanding. The authors recommend web scraping as a powerful tool in any data scientist's arsenal, as many data science projects start by obtaining an appropriate data set.
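As a concrete example of the "obtain an appropriate data set" step, a page's HTML tables can be pulled straight into a data frame. This sketch uses pandas rather than the book's own tooling; the URL is a placeholder, and pandas.read_html requires an HTML parser such as lxml to be installed:

    import pandas as pd

    # read_html returns one DataFrame per <table> element on the page
    tables = pd.read_html("https://example.com/population-table")  # placeholder
    df = tables[0]
    df.to_csv("scraped_table.csv", index=False)
    print(df.head())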
On the responsible use of digital data to tackle the COVID-19 pandemic
Large-scale collection of data could help curb the COVID-19 pandemic, but it should not neglect privacy and public trust. Best practices should be identified to maintain responsible data-collection and data-processing standards at a global scale.
Website scraping with Python: using BeautifulSoup and Scrapy
\"Closely examine website scraping and data processing: the technique of extracting data from websites in a format suitable for further analysis. You'll review which tools to use, and compare their features and efficiency. Focusing on BeautifulSoup4 and Scrapy, this concise, focused book highlights common problems and suggests solutions that readers can implement on their own. Website Scraping with Python starts by introducing and installing the scraping tools and explaining the features of the full application that readers will build throughout the book. You'll see how to use BeautifulSoup4 and Scrapy individually or together to achieve the desired results. Because many sites use JavaScript, you'll also employ Selenium with a browser emulator to render these sites and make them ready for scraping. By the end of this book, you'll have a complete scraping application to use and rewrite to suit your needs. As a bonus, the author shows you options of how to deploy your spiders into the Cloud to leverage your computer from long-running scraping tasks\"--Back cover.
Tapped out or barely tapped? Recommendations for how to harness the vast and largely unused potential of the Mechanical Turk participant pool
Mechanical Turk (MTurk) is a common source of research participants within the academic community. Despite MTurk's utility and benefits over traditional subject pools, some researchers have questioned whether it is sustainable. Specifically, some have asked whether MTurk workers are too familiar with manipulations and measures common in the social sciences, the result of many researchers relying on the same small participant pool. Here, we show that concerns about non-naïveté on MTurk are due less to the MTurk platform itself and more to the way researchers use the platform. Specifically, we find that there are at least 250,000 MTurk workers worldwide and that a large majority of US workers are new to the platform each year and therefore relatively inexperienced as research participants. We describe how inexperienced workers are excluded from studies, in part, because of the worker reputation qualifications researchers commonly use. Then, we propose and evaluate an alternative approach to sampling on MTurk that allows researchers to access inexperienced participants without sacrificing data quality. We recommend that, in some cases, researchers limit the number of highly experienced workers allowed in their study by excluding these workers or by stratifying sample recruitment based on worker experience levels. We discuss the trade-offs of different sampling practices on MTurk and describe how the above sampling strategies can help researchers harness the vast and largely untapped potential of the Mechanical Turk participant pool.
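The exclusion mechanism the authors describe runs through MTurk's qualification system. A sketch of how a requester might cap worker experience using boto3 (not the authors' code; the 500-HIT threshold and the HIT parameters are arbitrary illustrations):

    import boto3

    mturk = boto3.client("mturk", region_name="us-east-1")

    # "Number of HITs Approved" is a built-in system qualification; requiring
    # it to be below a threshold admits only relatively inexperienced workers.
    inexperienced_only = [{
        "QualificationTypeId": "00000000000000000040",  # Number of HITs Approved
        "Comparator": "LessThan",
        "IntegerValues": [500],
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    }]

    question_xml = "..."  # an HTMLQuestion or ExternalQuestion XML payload

    mturk.create_hit(
        Title="Short research survey",
        Description="A five-minute survey",
        Reward="1.00",
        MaxAssignments=100,
        AssignmentDurationInSeconds=1800,
        LifetimeInSeconds=86400,
        Question=question_xml,
        QualificationRequirements=inexperienced_only,
    )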
Blockchain and clinical trial : securing patient data
\"This book aims to highlight the gaps and the transparency issues in the clinical research and trials processes and how there is a lack of information flowing back to researchers and patients involved in those trials. Lack of data transparency is an underlying theme within the clinical research world and causes issues of corruption, fraud, errors and a problem of reproducibility. Blockchain can prove to be a method to ensure a much more joined up and integrated approach to data sharing and improving patient outcomes. Surveys undertaken by creditable organisations in the healthcare industry are analysed in this book that show strong support for using blockchain technology regarding strengthening data security, interoperability and a range of beneficial use cases where mostly all respondents of the surveys believe blockchain will be important for the future of the healthcare industry. Another aspect considered in the book is the coming surge of healthcare wearables using Internet of Things (IoT) and the prediction that the current capacity of centralised networks will not cope with the demands of data storage. The benefits are great for clinical research, but will add more pressure to the transparency of clinical trials and how this is managed unless a secure mechanism like, blockchain is used\"--Publisher's description.
Big Data in Public Affairs
This article offers an overview of the conceptual, substantive, and practical issues surrounding "big data" to provide one perspective on how the field of public affairs can successfully cope with the big data revolution. Big data in public affairs refers to a combination of administrative data collected through traditional means and large-scale data sets created by sensors, computer networks, or individuals as they use the Internet. In public affairs, new opportunities for real-time insights into behavioral patterns are emerging but are bound by safeguards limiting government reach through the restriction of the collection and analysis of these data. To address both the opportunities and challenges of this emerging phenomenon, the authors first review the evolving canon of big data articles across related fields. Second, they derive a working definition of big data in public affairs. Third, they review the methodological and analytic challenges of using big data in public affairs scholarship and practice. The article concludes with implications for public affairs.
Collecting experiments : making Big Data biology
Databases have revolutionized nearly every aspect of our lives. Information of all sorts is being collected on a massive scale, from Google to Facebook and well beyond. But as the amount of information in databases explodes, we are forced to reassess our ideas about what knowledge is, how it is produced, to whom it belongs, and who can be credited for producing it. Every scientist working today draws on databases to produce scientific knowledge. Databases have become more common than microscopes, voltmeters, and test tubes, and the increasing amount of data has led to major changes in research practices and profound reflections on the proper professional roles of data producers, collectors, curators, and analysts. Collecting Experiments traces the development and use of data collections, especially in the experimental life sciences, from the early twentieth century to the present. It shows that the current revolution is best understood as the coming together of two older ways of knowing--collecting and experimenting, the museum and the laboratory. Ultimately, Bruno J. Strasser argues that by serving as knowledge repositories, as well as indispensable tools for producing new knowledge, these databases function as digital museums for the twenty-first century.
The Australian Pharmaceutical Benefits Scheme data collection: a practical guide for researchers
The Pharmaceutical Benefits Scheme (PBS) is Australia's national drug subsidy program. This paper provides a practical guide for researchers using PBS data to examine prescribed medicine use. Excerpts of the PBS data collection are available in a variety of formats. We describe the core components of four publicly available extracts (the Australian Statistics on Medicines, PBS statistics online, section 85 extract, under co-payment extract). We also detail common analytical challenges and key issues regarding the interpretation of utilisation estimates from the PBS collection and its various extracts. Research using routinely collected data is increasing internationally. PBS data are a valuable resource for Australian pharmacoepidemiological and pharmaceutical policy research. A detailed knowledge of the PBS, the nuances of data capture, and the extracts available for research purposes is necessary to ensure robust methodology, interpretation, and translation of study findings into policy and practice.
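As a sketch of the kind of utilisation analysis the paper prepares readers for: load an extract and count monthly dispensings for a drug class. The file name and column names below are hypothetical stand-ins; the real section 85 extract's layout is documented with the data release:

    import pandas as pd

    # Hypothetical file and column names, stand-ins for a real PBS extract
    pbs = pd.read_csv("pbs_section85_extract.csv", parse_dates=["supply_date"])

    # Monthly count of statin dispensings (ATC codes beginning with C10AA)
    statins = pbs[pbs["atc_code"].str.startswith("C10AA")]
    monthly = statins.groupby(statins["supply_date"].dt.to_period("M")).size()
    print(monthly.head())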